Prediction of protein folding rates from primary sequences using hybrid sequence representation
Abstract
The ability to predict protein folding rates is an important step toward understanding overall folding mechanisms. Although many prediction methods are structure based, successful predictions can also be obtained from the sequence alone. We developed a novel method, prediction of protein folding rates (PPFR), which predicts folding rates from protein sequences. PPFR implements a linear regression model for each of the mainstream folding dynamics: two-state, multistate, and mixed-state proteins. The method's predictions correlate strongly with the experimental folding rates: the correlation equals 0.87 for the two-state and multistate proteins and 0.82 for the mixed-state proteins when evaluated with an out-of-sample jackknife test. Based on in-sample and out-of-sample tests, PPFR's predictions are shown to be better than those of most other sequence-only and structure-based predictors and complementary to the predictions of the most recent sequence-based QRSM method. We show that simultaneously incorporating several characteristics, including the sequence, physicochemical properties of residues, and predicted secondary structure, improves prediction quality. This hybrid prediction model was analyzed to reveal the complementary factors that can be used in tandem to predict folding rates. We show that bigger proteins require more time for folding; that higher helical and coil content and the presence of Phe, Asn, and Gln may accelerate the folding process; that the inclusion of Ile, Val, Thr, and Ser may slow down the folding process; and that, for the two-state proteins, increased beta-strand content may decelerate the folding process. Finally, PPFR provides strong correlation when predicting folding rates for sequences with low similarity.
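The evaluation protocol described in the abstract (a linear regression scored by an out-of-sample jackknife test and by in-sample correlation) can be sketched as follows. This is a minimal illustration and not the PPFR model itself: it uses a single hypothetical feature (chain length) instead of PPFR's hybrid feature set, and the numbers in `lengths` and `ln_rates` are invented toy values, not data from the paper. The helper names `fit_line`, `pearson`, and `jackknife_predictions` are likewise assumptions made for this example.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(us, vs):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / sqrt(var_u * var_v)


def jackknife_predictions(xs, ys):
    """Leave-one-out: predict each point with a model fitted on all the others."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds


# Invented toy data (NOT from the paper): chain length vs. ln(folding rate).
lengths = [56.0, 64.0, 76.0, 89.0, 98.0, 110.0, 129.0, 151.0]
ln_rates = [8.1, 6.9, 7.2, 5.0, 5.6, 3.9, 3.1, 1.8]

# In-sample: fit on all proteins, then score predictions on the same proteins.
intercept, slope = fit_line(lengths, ln_rates)
r_in = pearson([intercept + slope * x for x in lengths], ln_rates)

# Out-of-sample: every prediction comes from a model that never saw that protein.
loo_preds = jackknife_predictions(lengths, ln_rates)
r_out = pearson(loo_preds, ln_rates)

print(f"slope = {slope:.4f}")
print(f"in-sample r = {r_in:.3f}, jackknife r = {r_out:.3f}")
```

In this toy setup the fitted slope is negative, which mirrors the abstract's qualitative statement that bigger proteins fold more slowly. The jackknife correlation is the more conservative of the two scores, because each protein is predicted by a model that was fitted without it.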
Similar resources
Structural Characteristics of Stable Folding Intermediates of Yeast Iso-1-Cytochrome-c
Cytochrome-c (cyt-c) is an electron transport protein, and it is present throughout the evolution. More than 280 sequences have been reported in the protein sequence database (www.uniprot.org). Though sequentially diverse, cyt-c has essentially retained its tertiary structure or fold. Thus a vast data set of varied sequences with retention of similar structure and fun...
Relation Between RNA Sequences, Structures, and Shapes via Variation Networks
Background: RNA plays a key role in many aspects of biological processes, and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...
A general method for the prediction of the three dimensional structure and folding pathway of globular proteins: Application to designed helical proteins
Starting from amino acid sequence alone, a general approach for simulating folding into the molten globule or rigid, native state depending on sequence is described. In particular, the 3D folds of two simple designed proteins have been predicted using a Monte Carlo folding algorithm. The model employs a very flexible hybrid lattice representation of the protein conformation, and fast lattice dy...
A Statistical Model for Predicting Protein Folding Rates from Amino Acid Sequence with Structural Class Information
Prediction of protein folding rates from amino acid sequences is one of the most important challenges in molecular biology. In this work, I have related the protein folding rates with physical-chemical, energetic and conformational properties of amino acid residues. I found that the classification of proteins into different structural classes shows an excellent correlation between amino acid pr...
Prediction of protein folding rates from the amino acid sequence-predicted secondary structure.
We present a method for predicting folding rates of proteins from their amino acid sequences only, or rather, from their chain lengths and their helicity predicted from their sequences. The method achieves 82% correlation with experiment over all 64 "two-state" and "multistate" proteins (including two artificial peptides) studied up to now.
Journal title:
- Journal of Computational Chemistry
Volume 30, issue 5
Pages: -
Publication date: 2009